Galaxy Bias and Halo-Occupation Numbers from Large-Scale Clustering 
We show that current surveys have at least as much signal to noise in higher-order statistics as 
in the power spectrum at weakly nonlinear scales. We discuss how one can use this information to 
determine the mean of the galaxy halo occupation distribution (HOD) using only large-scale infor- 
mation, through galaxy bias parameters determined from the galaxy bispectrum and trispectrum. 
After introducing an averaged, reasonably fast to evaluate, trispectrum estimator, we show that 
the expected errors on linear and quadratic bias parameters can be reduced by at least 20-40%. 
Also, the inclusion of the trispectrum information, which is sensitive to "three-dimensionality" of 
structures, helps significantly in constraining the mass dependence of the HOD mean. Our approach 
depends only on adequate modeling of the abundance and large-scale clustering of halos and thus 
is independent of details of how galaxies are distributed within halos. This provides a consistency 
check on the traditional approach of using two-point statistics down to small scales, which neces- 
sarily makes more assumptions. We present a detailed forecast of how well our approach can be 
carried out in the case of the SDSS. 



I. INTRODUCTION 



Galaxy clustering in current surveys is providing
tight constraints on cosmological parameters and the
nature of primordial fluctuations [refs.]. One of the
major issues in obtaining constraints from galaxy
surveys on cosmology is the relationship between
galaxy clustering and the underlying dark matter.
This "galaxy bias" can be best studied at large
scales using higher-order statistics [refs.], such as the
higher-order moments [refs.], the three-point function
[refs.], and the bispectrum [refs.].

In this paper we consider how well one can mea- 
sure the galaxy trispectrum, and how this can be 
used together with the bispectrum to place con- 
straints on galaxy bias at large scales. The trispec- 
trum is the lowest order statistic that is sensitive to 
the three-dimensional character of structures gener- 
ated by gravitational instability and thus is a nat- 
ural candidate to tell us interesting new informa- 
tion not contained in the power spectrum and bis- 
pectrum. Here we show that in surveys currently 
under completion, there will be enough signal to 
noise in the galaxy trispectrum to provide improved 
constraints over measurements of the bispectrum 
alone on galaxy linear and nonlinear bias parameters.
Four-point statistics have so far been measured mainly
at small scales in angular catalogs [refs.], with a
marginal detection of the trispectrum in redshift
surveys [19]. The use of the disconnected (Gaussian)
part of the trispectrum (not the one that concerns us
here) to probe primordial non-Gaussianity is studied
in [20]. Previous estimates of the accuracy of
higher-order moments and three-point statistics
expected in current surveys are given in [refs.].

Galaxy biasing is at present best connected to
galaxy formation using the "halo model", where
galaxies are only present within dark matter halos
in numbers prescribed by a halo occupation distribution
(HOD) and with a profile dictated by numerical
simulations [refs.]. In this language, it is possible to
directly map the large-scale bias parameters into a
probe of the mean of the HOD using only information
about the mass function and large-scale bias of dark
matter halos [refs.], which depends on better understood
large-scale physics. In this paper we address how well
one can constrain the mean of galaxy halo-occupation
numbers from measurements of the bias parameters
at large scales.

Our approach to constrain the mean of the HOD 
is complementary to constraints based on mea- 
surements of two-point statistics down to small 
scales [refs.], where more details need 
to be modeled, such as halo exclusion, galaxy dis- 
tribution profiles inside halos, velocity bias between 
FIG. 1: Slices 50 Mpc/h thick of a mock galaxy distribution obtained from an HOD fit in a ΛCDM model to 
the $M_r < -20$ galaxy two-point function in SDSS (left) and a Rayleigh-Levy flight (right). Despite their obvious 
differences, these two distributions have the same two-point statistics; the differences seen are entirely due to those 
in higher-order correlations, see Fig. 2.



galaxies and dark matter, and the second moment 
of the HOD distribution. Therefore, our method 
can provide a consistency check on these additional 
assumptions needed for small-scale studies. In addition,
because higher-order statistics measure the large-scale
linear bias, this breaks degeneracies between bias,
$\Omega_m$ and $\sigma_8$, which is necessary in order
to predict the halo mass function and halo bias
that maps constraints on bias into constraints on 
the mean of the HOD. Therefore, the halo statistics 
and the mean HOD are determined simultaneously. 
Other work has also obtained constraints on HOD 
parameters from higher-order statistics going down 
to small scales [refs.], but our purpose here is 
to see how much can be done with large-scale infor- 
mation where only the simpler physics of halos plays 
a role. 

From a physical point of view, measuring three 
and four-point statistics at large scales gives a rather 
complete picture of the non-linear couplings induced 
during the formation of large-scale structures, and 



thus a fundamental test of gravitational instability [42].
From a purely statistical point of view, the richness in
the dependence of the bispectrum and trispectrum on
the configuration of the points allows one to disentangle
the relative probabilities of elongated versus compact
shapes (bispectrum) and planar versus three-dimensional
character of large-scale structures (trispectrum).

That higher-order statistics can help to break degeneracies
otherwise present should be of no surprise, although a
visual example may illustrate the power of this method
more clearly. Figure 1 shows two distributions that
clearly look "very different" to the eye. The left panel
shows a mock galaxy distribution obtained from an HOD
fit to the $M_r < -20$ galaxy two-point function in SDSS
[ref.] assuming a ΛCDM halo population. The right panel
shows instead a Rayleigh-Levy flight [refs., 45] with
parameters chosen to match the power spectrum of the
previous distribution [68].

The left panel in Fig. 2 shows that indeed the 
FIG. 2: The distributions in Fig. 1 have the same power spectrum (left) but can be easily distinguished by their 
bispectrum and trispectrum (right). Square symbols correspond to the HOD galaxies, triangles to the Rayleigh-Levy 
flight. The bispectrum ($Q_B$) and trispectrum ($Q_T$) are for all shapes of triangles and "quads" (see section II C) in 
the range $0.04\,h\,{\rm Mpc}^{-1} < k < 0.4\,h\,{\rm Mpc}^{-1}$, here binned into $N_T = 170$ and $N_Q = 203$ configurations, respectively. 
The variations seen in $Q_B$ and $Q_T$ in the HOD galaxies are due to the dependence of higher-order correlations on 
the shape of the configuration, a reflection of the filamentary structure seen in the left panel in Fig. 1.



power spectra at all scales are very similar, and thus 
degenerate. That this can happen should not be 
too surprising, after all the two-point function (or 
power spectrum) only measures the average number 
of neighbors from a given object as a function of 
separation, a rather crude statistic. The right panel
in Fig. 2 shows that the two distributions are easily
distinguished by their bispectrum (top) and trispectrum
(bottom) for essentially all configurations of
points. The Rayleigh-Levy flight predicts $Q_B = 0.5$
and $Q_T = 0.75$ independent (approximately for $Q_T$)
of configuration and scale [refs., 69]. It is interesting
to note that this model was proposed in the 1970s
in response to the observational results from the Lick
catalog that showed $Q_B$, $Q_T$ being consistent with
constants at small scales; this ruled out the previous
incarnation of the halo model [refs.], where galaxies
populate identical halos with power-law profiles
chosen to match two-point statistics.



This paper is organized as follows. In the next 
section we briefly review the bispectrum and trispec- 
trum generated by gravitational instability at large 
scales, the effects on it of galaxy biasing and the 
estimators of the bispectrum and trispectrum. In 
section III we discuss the determination of bias parameters
from galaxy surveys and compare the signal
to noise in higher-order statistics to that in the
power spectrum. Finally, in section IV we show how
one can turn the constraints on bias parameters into 
constraints on the mean HOD. 
II. THE LSS BISPECTRUM AND 
TRISPECTRUM 

A. Bispectrum and Trispectrum generated by 
Gravity at Large Scales 

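In this paper we will assume the dark matter primordial
fluctuations to be Gaussian. The three-point function
and the connected four-point function observed in galaxy
surveys will then be a consequence of gravitational
instability and galaxy biasing. At the scales relevant for
this study, we can work in Eulerian Perturbation Theory
(EPT), taking into account corrections to the linear
evolution $\delta_L$ of second order for the bispectrum and
up to third order for the trispectrum,

\begin{align}
\langle \delta_{\mathbf{k}_1} \delta_{\mathbf{k}_2} \rangle &= \delta_D(\mathbf{k}_{12})\, P(k_1), \tag{1} \\
\langle \delta_{\mathbf{k}_1} \delta_{\mathbf{k}_2} \delta_{\mathbf{k}_3} \rangle &= \delta_D(\mathbf{k}_{123})\, B(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3), \tag{2} \\
\langle \delta_{\mathbf{k}_1} \delta_{\mathbf{k}_2} \delta_{\mathbf{k}_3} \delta_{\mathbf{k}_4} \rangle_c &= \delta_D(\mathbf{k}_{1234})\, T(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3,\mathbf{k}_4), \tag{3}
\end{align}

where $\langle \ldots \rangle_c$ implies that only connected terms are
included in the average (and $\mathbf{k}_{12\ldots} \equiv \mathbf{k}_1 + \mathbf{k}_2 + \ldots$),

\begin{equation}
B(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3) = 2 F_2(\mathbf{k}_1,\mathbf{k}_2)\, P_1 P_2 + {\rm cyc.}, \tag{4}
\end{equation}

is the bispectrum, and the trispectrum can be split
into two different contributions, $T = T_a + T_b$, with

\begin{align}
T_a &= 4 P_1 P_2 \left[ P_{13}\, F_2(\mathbf{k}_1,-\mathbf{k}_{13})\, F_2(\mathbf{k}_2,\mathbf{k}_{13})
 + P_{14}\, F_2(\mathbf{k}_1,-\mathbf{k}_{14})\, F_2(\mathbf{k}_2,\mathbf{k}_{14}) \right] + {\rm cyc.}, \tag{5} \\
T_b &= \left[ F_3(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3) + {\rm perm.} \right] P_1 P_2 P_3 + {\rm cyc.}, \tag{6}
\end{align}

where $P_i \equiv P(k_i)$ and $P_{ij} \equiv P(|\mathbf{k}_i + \mathbf{k}_j|)$. It is useful to
introduce the reduced bispectrum $Q_B$ and trispectrum $Q_T$,

\begin{align}
Q_B(k_1,k_2,k_3) &\equiv \frac{B(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3)}{P_1 P_2 + P_1 P_3 + P_2 P_3}, \tag{7} \\
Q_T(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3,\mathbf{k}_4) &\equiv \frac{T(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3,\mathbf{k}_4)}{P_1 P_2 P_3 + {\rm cyc.}}, \tag{8}
\end{align}

which have the advantage of being almost independent
of scale and cosmological parameters such as
$\Omega_m$ and $\sigma_8$.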

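The $F_2$ and $F_3$ kernels describe the second- and
third-order solutions in EPT, and can be written in
terms of two fundamental mode-coupling functions,

\begin{align}
\alpha(\mathbf{k}_1,\mathbf{k}_2) &\equiv \frac{\mathbf{k}_{12}\cdot\mathbf{k}_1}{k_1^2}, \tag{9} \\
\beta(\mathbf{k}_1,\mathbf{k}_2) &\equiv \frac{k_{12}^2\,(\mathbf{k}_1\cdot\mathbf{k}_2)}{2\,k_1^2 k_2^2}, \tag{10}
\end{align}

which represent the nonlinearities involved in mass
and momentum conservation, respectively. The
relationship between them and the kernels reads

\begin{equation}
F_2 = \frac{5}{7}\,\alpha(\mathbf{k}_1,\mathbf{k}_2) + \frac{2}{7}\,\beta(\mathbf{k}_1,\mathbf{k}_2)
    = \frac{5}{7} + \frac{x}{2}\left(\frac{k_1}{k_2} + \frac{k_2}{k_1}\right) + \frac{2}{7}\,x^2, \tag{11}
\end{equation}

where $x \equiv \hat{\mathbf{k}}_1\cdot\hat{\mathbf{k}}_2$ (the closed form follows after
symmetrizing $\alpha$ over its two arguments), and

\begin{align}
F_3 &= \frac{7}{18}\,\alpha(\mathbf{k}_1,\mathbf{k}_{23})\, F_2(\mathbf{k}_2,\mathbf{k}_3)
 + \frac{1}{9}\,\beta(\mathbf{k}_1,\mathbf{k}_{23})\, G_2(\mathbf{k}_2,\mathbf{k}_3) \nonumber \\
 &\quad + \frac{1}{18}\, G_2(\mathbf{k}_1,\mathbf{k}_2) \left[ 7\,\alpha(\mathbf{k}_{12},\mathbf{k}_3) + 2\,\beta(\mathbf{k}_{12},\mathbf{k}_3) \right], \tag{12}
\end{align}

where the kernel $G_2$ is obtained from $F_2$ in Eq. (11)
by replacing 5 by 3 and 2 by 4. We thus see that, in a
sense, the bispectrum and trispectrum contain rather
"complete" information on large-scale clustering; in
principle one could try to deduce $\alpha$ and $\beta$ from $B$
and $T$. We shall explore this possibility in future
work [56].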
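The kernels above are simple enough to evaluate directly. The following is a minimal sketch, not the authors' code, of the mode-coupling functions of Eqs. (9)-(10), the symmetrized $F_2$ and $G_2$ kernels, and the tree-level reduced bispectrum of Eqs. (4) and (7). The function names and the power-law spectrum used in the example are assumptions made here for illustration only.

```python
import math

def alpha(k1, k2, x):
    """alpha(k1, k2) = (k12 . k1) / k1^2, with x the cosine between k1 and k2."""
    return 1.0 + x * k2 / k1

def beta(k1, k2, x):
    """beta(k1, k2) = k12^2 (k1 . k2) / (2 k1^2 k2^2)."""
    k12_sq = k1 * k1 + k2 * k2 + 2.0 * k1 * k2 * x
    return k12_sq * x / (2.0 * k1 * k2)

def f2(k1, k2, x):
    """Symmetrized second-order density kernel, closed form of Eq. (11)."""
    return 5.0 / 7.0 + 0.5 * x * (k1 / k2 + k2 / k1) + (2.0 / 7.0) * x * x

def g2(k1, k2, x):
    """Velocity kernel: Eq. (11) with 5 -> 3 and 2 -> 4."""
    return 3.0 / 7.0 + 0.5 * x * (k1 / k2 + k2 / k1) + (4.0 / 7.0) * x * x

def f2_from_alpha_beta(k1, k2, x):
    """F_2 built from alpha (symmetrized in its arguments) and beta."""
    alpha_sym = 0.5 * (alpha(k1, k2, x) + alpha(k2, k1, x))
    return (5.0 / 7.0) * alpha_sym + (2.0 / 7.0) * beta(k1, k2, x)

def cosine_between(ka, kb, kc):
    """Cosine of the angle between ka and kb for a closed triangle ka + kb + kc = 0."""
    return (kc * kc - ka * ka - kb * kb) / (2.0 * ka * kb)

def reduced_bispectrum(k1, k2, k3, power):
    """Tree-level Q_B of Eq. (7) for a closed triangle; `power` is any callable P(k)."""
    p1, p2, p3 = power(k1), power(k2), power(k3)
    b = 2.0 * (f2(k1, k2, cosine_between(k1, k2, k3)) * p1 * p2
               + f2(k2, k3, cosine_between(k2, k3, k1)) * p2 * p3
               + f2(k3, k1, cosine_between(k3, k1, k2)) * p3 * p1)
    return b / (p1 * p2 + p2 * p3 + p3 * p1)

if __name__ == "__main__":
    power_law = lambda k: k ** -1.5  # hypothetical spectrum, illustration only
    print(reduced_bispectrum(0.1, 0.1, 0.1, power_law))
```

For equilateral triangles the spectrum cancels and this sketch returns $Q_B = 4/7$, which illustrates the weak dependence of the reduced amplitudes on the power spectrum noted above.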



B. Galaxy Biasing at Large Scales 

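Since gravity is the only long-range force in the
problem, at large scales we can assume the bias to
be local; therefore, when smoothed over large enough
scales (compared to dark matter halo sizes), the
galaxy number density contrast and dark matter
density contrast are related by [ref.]

\begin{equation}
\delta_g = b_1\,\delta + \frac{b_2}{2}\,\delta^2 + \frac{b_3}{6}\,\delta^3, \tag{13}
\end{equation}

where $b_1$, $b_2$ and $b_3$ are constants, the bias parameters.
The galaxy power spectrum at large scales is then
given by $P^{(g)}(k) \simeq b_1^2 P(k)$, while the bispectrum of
the galaxy distribution can be expressed in terms of
the dark matter bispectrum as

\begin{equation}
B^{(g)}(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3) = b_1^3\, B(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3) + b_1^2 b_2 \left( P_1 P_2 + {\rm cyc.} \right), \tag{14}
\end{equation}

while the reduced galaxy bispectrum $Q_B^{(g)}$ is

\begin{equation}
Q_B^{(g)} = \frac{Q_B}{b_1} + \frac{b_2}{b_1^2}. \tag{15}
\end{equation}

For the galaxy trispectrum it follows that

\begin{equation}
T^{(g)} = b_1^4\, T^{(1)} + \frac{b_1^3 b_2}{2}\, T^{(2)} + \frac{b_1^2 b_2^2}{4}\, T^{(3)} + \frac{b_1^3 b_3}{6}\, T^{(4)}, \tag{16}
\end{equation}

where $T^{(1)} \equiv T_a + T_b$ as defined in Eqs. (5) and (6), and

\begin{align}
T^{(2)} &= 4 P_1 \left[ F_2(\mathbf{k}_2,\mathbf{k}_3) P_2 P_3 + F_2(\mathbf{k}_2,-\mathbf{k}_{23}) P_2 P_{23} + F_2(\mathbf{k}_3,-\mathbf{k}_{23}) P_3 P_{23} \right] \nonumber \\
 &\quad + 4 P_2 \left[ F_2(\mathbf{k}_1,\mathbf{k}_3) P_1 P_3 + F_2(\mathbf{k}_1,-\mathbf{k}_{13}) P_1 P_{13} + F_2(\mathbf{k}_3,-\mathbf{k}_{13}) P_3 P_{13} \right] \nonumber \\
 &\quad + 4 P_3 \left[ F_2(\mathbf{k}_1,\mathbf{k}_2) P_1 P_2 + F_2(\mathbf{k}_1,-\mathbf{k}_{12}) P_1 P_{12} + F_2(\mathbf{k}_2,-\mathbf{k}_{12}) P_2 P_{12} \right] + {\rm cyc.}\ [4\ {\rm terms}], \tag{17} \\
T^{(3)} &= 4 P_1 P_2 \left( P_{13} + P_{14} \right) + {\rm cyc.}\ [12\ {\rm terms}], \tag{18} \\
T^{(4)} &= 6 P_1 P_2 P_3 + {\rm cyc.}\ [4\ {\rm terms}]. \tag{19}
\end{align}

The reduced trispectrum, in this case, reads

\begin{equation}
Q_T^{(g)} = \frac{1}{b_1^2}\, Q_T^{(1)} + \frac{b_2}{2 b_1^3}\, Q_T^{(2)} + \frac{b_2^2}{4 b_1^4}\, Q_T^{(3)} + \frac{b_3}{6 b_1^3}\, Q_T^{(4)}, \tag{20}
\end{equation}

where $Q_T^{(i)} \equiv T^{(i)}/(P_1 P_2 P_3 + {\rm cyc.})$ and $Q_T^{(4)} = 6$ by
definition.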
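As an illustration of how the bias parameters enter the reduced amplitudes, the sketch below evaluates the local-bias mappings for $Q_B$ and $Q_T$. It assumes the coefficients $1/b_1^2$, $b_2/(2b_1^3)$, $b_2^2/(4b_1^4)$ and $b_3/(6b_1^3)$ for the four trispectrum components that follow from the cubic local-bias expansion with $Q_T^{(4)} = 6$; the function and argument names are hypothetical.

```python
def galaxy_reduced_bispectrum(q_b, b1, b2):
    """Eq. (15): Q_B^(g) = Q_B / b1 + b2 / b1^2."""
    return q_b / b1 + b2 / b1 ** 2

def galaxy_reduced_trispectrum(q_t1, q_t2, q_t3, b1, b2, b3):
    """Reduced galaxy trispectrum from its dark matter components.

    q_t1, q_t2, q_t3 are Q_T^(1), Q_T^(2), Q_T^(3) for one configuration;
    Q_T^(4) = 6 by definition, so the cubic-bias term reduces to b3 / b1^3.
    """
    q_t4 = 6.0
    return (q_t1 / b1 ** 2
            + b2 * q_t2 / (2.0 * b1 ** 3)
            + b2 ** 2 * q_t3 / (4.0 * b1 ** 4)
            + b3 * q_t4 / (6.0 * b1 ** 3))

if __name__ == "__main__":
    # Doubling b1 halves Q_B but reduces the gravity part of Q_T by a factor of four.
    print(galaxy_reduced_bispectrum(0.8, 2.0, 0.0))
    print(galaxy_reduced_trispectrum(2.0, 5.0, 7.0, 2.0, 0.0, 0.0))
```

The different powers of $b_1$ in the two statistics are what make the trispectrum twice as sensitive to a fractional change in the linear bias.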



C. Bispectrum and Trispectrum estimators 

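To discuss our particular implementation of an
averaged trispectrum estimator we note that, given
Fourier coefficients $\delta_{\mathbf{q}}$, a bispectrum estimator can be
written as [49]

\begin{equation}
\hat{B}_{123} \equiv \frac{V_f}{V_B} \int_{k_1}\! d^3q_1 \int_{k_2}\! d^3q_2 \int_{k_3}\! d^3q_3\;
 \delta_D(\mathbf{q}_{123})\, \delta_{\mathbf{q}_1} \delta_{\mathbf{q}_2} \delta_{\mathbf{q}_3}, \tag{21}
\end{equation}

where the integration is over the bin defined by
$q_i \in (k_i - \delta k/2,\, k_i + \delta k/2)$, $V_f \equiv k_f^3 = (2\pi)^3/V$ is the
volume of the fundamental cell in Fourier space, and

\begin{equation}
V_B \equiv \int_{k_1}\! d^3q_1 \int_{k_2}\! d^3q_2 \int_{k_3}\! d^3q_3\; \delta_D(\mathbf{q}_{123})
 \simeq 8\pi^2\, k_1 k_2 k_3\, \delta k^3. \tag{22}
\end{equation}

The corresponding variance is

\begin{equation}
\Delta \hat{B}_{123}^2 = V_f\, \frac{s_B}{V_B}\, P_{\rm tot}(k_1)\, P_{\rm tot}(k_2)\, P_{\rm tot}(k_3), \tag{23}
\end{equation}

where $s_B = 6, 2, 1$ for equilateral, isosceles and general
triangles, respectively, and the total power spectrum
(accounting for shot noise) is

\begin{equation}
P_{\rm tot}(k) \equiv P(k) + \frac{1}{(2\pi)^3}\,\frac{1}{\bar{n}}. \tag{24}
\end{equation}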



Such a definition for an estimator is trivially extended
to the trispectrum. A particular configuration
of the 4-point function is completely determined
given 6 parameters ($N(N-1)/2$ for the $N$-point
case). These can be, for instance, the four
lengths $k_i = |\mathbf{k}_i|$ for $i = 1,2,3,4$ plus the diagonals
$d_1 = |\mathbf{k}_1 - \mathbf{k}_2|$ and $d_2 = |\mathbf{k}_1 - \mathbf{k}_3|$. The trispectrum
estimator can then be written as

\begin{align}
\hat{T} &\equiv \frac{V_f}{V_T} \int_{k_1}\! d^3q_1 \ldots \int_{k_4}\! d^3q_4 \int_{d_1}\! d^3p_1 \int_{d_2}\! d^3p_2\;
 \delta_D(\mathbf{q}_{1234}) \nonumber \\
 &\quad \times\, \delta_D(\mathbf{p}_1 - \mathbf{q}_1 + \mathbf{q}_2)\, \delta_D(\mathbf{p}_2 - \mathbf{q}_1 + \mathbf{q}_3)\;
 \delta_{\mathbf{q}_1} \delta_{\mathbf{q}_2} \delta_{\mathbf{q}_3} \delta_{\mathbf{q}_4}, \tag{25}
\end{align}

where the integrations are taken over bins which are
spherical shells in Fourier space of thickness $\delta k$, and
$V_T$ denotes the same integral as in the numerator
but with Fourier coefficients replaced by one, as in
the bispectrum case [see Eqs. (21)-(22)]. However, in
this work we will pursue a simpler approach, leaving
the more detailed general case for a future paper.
One can construct an "angle averaged" trispectrum
that depends only on 4 variables, rather than 6, by
removing the constraint on the diagonals, i.e.

\begin{equation}
\hat{\tilde{T}}_{1234} \equiv \frac{V_f}{V_{\tilde T}} \int_{k_1}\! d^3q_1 \ldots \int_{k_4}\! d^3q_4\;
 \delta_D(\mathbf{q}_{1234})\, \delta_{\mathbf{q}_1} \delta_{\mathbf{q}_2} \delta_{\mathbf{q}_3} \delta_{\mathbf{q}_4}, \tag{26}
\end{equation}

where $V_{\tilde T}$ is given by

\begin{align}
V_{\tilde T} &\equiv \int_{k_1}\! d^3q_1 \ldots \int_{k_4}\! d^3q_4\; \delta_D(\mathbf{q}_{1234}) \nonumber \\
 &= 8\pi^3\, \delta k^4\, k_1 k_2 k_3 k_4 \times \big[ k_1 + k_2 + k_3 + k_4 - |k_1 + k_2 - k_3 - k_4| \nonumber \\
 &\qquad - |k_1 - k_2 + k_3 - k_4| - |k_1 - k_2 - k_3 + k_4| \big]. \tag{27}
\end{align}

We denote by "quad" a configuration with fixed
$k_1, k_2, k_3, k_4$ that contributes to Eq. (26). The variance
of $\tilde{T}$ is simply

\begin{equation}
\Delta \tilde{T}^2 = V_f\, \frac{s_T}{V_{\tilde T}}\, P_{\rm tot}(k_1)\, P_{\rm tot}(k_2)\, P_{\rm tot}(k_3)\, P_{\rm tot}(k_4), \tag{28}
\end{equation}

where $s_T = 24, 6, 2, 1$ if 4, 3, 2 or none of the $k_i$'s are
equal, or $s_T = 4$ if we have two paired couples.
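The counting volumes and symmetry factors entering the variances in Eqs. (23) and (28) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the symmetry factor is written here as a product of factorials of the multiplicities of equal sides (which reproduces the values quoted in the text), and the quad volume is clamped at zero for side lengths that cannot close, which is an added safeguard.

```python
import math
from collections import Counter

def volume_triangles(k1, k2, k3, dk):
    """V_B of Eq. (22) for shells of thickness dk."""
    return 8.0 * math.pi ** 2 * k1 * k2 * k3 * dk ** 3

def volume_quads(k1, k2, k3, k4, dk):
    """Angle-averaged quad volume of Eq. (27); zero if the sides cannot close."""
    bracket = (k1 + k2 + k3 + k4
               - abs(k1 + k2 - k3 - k4)
               - abs(k1 - k2 + k3 - k4)
               - abs(k1 - k2 - k3 + k4))
    return 8.0 * math.pi ** 3 * dk ** 4 * k1 * k2 * k3 * k4 * max(bracket, 0.0)

def symmetry_factor(sides):
    """s_B or s_T: product of factorials of the multiplicities of equal sides.

    `sides` should hold identical values (e.g. integer bin indices) for equal sides.
    """
    factor = 1
    for multiplicity in Counter(sides).values():
        factor *= math.factorial(multiplicity)
    return factor

def total_power(p, nbar):
    """P_tot of Eq. (24), in the Fourier convention where shot noise is 1/((2 pi)^3 nbar)."""
    return p + 1.0 / ((2.0 * math.pi) ** 3 * nbar)

def averaged_trispectrum_variance(sides, dk, kf, p_tot):
    """Eq. (28) for one quad; `p_tot` is a callable returning P_tot(k)."""
    product = 1.0
    for k in sides:
        product *= p_tot(k)
    return kf ** 3 * symmetry_factor(sides) / volume_quads(*sides, dk) * product
```

For four equal sides the bracket in Eq. (27) reduces to $4k$, so the volume becomes $32\pi^3 k^5 \delta k^4$, the combination that appears in the simple estimates of the next section.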



III. DETERMINATION OF THE BIAS 
PARAMETERS 

A. Simple Estimates 

We would like here to obtain a simple estimate
of the uncertainty on the bias parameters that we
can expect from an analysis involving bispectrum
and trispectrum measurements. This is not an easy
task, since the signal to noise for the angle-averaged
trispectrum is given by a complicated integral that
can hardly be replaced by a single, even approximate,
number. What we can estimate is the
dependence on the survey volume and on the smallest
scale included in the analysis, expressed in terms
of $k_{\max}$, the maximum value of the wave number
included in the sums over the configurations.
We consider here, as a simple example, the signal
to noise due to the contribution from gravity to the
bispectrum and trispectrum, i.e. that sensitive to
the linear bias $b_1$, corresponding to $B$ and $T^{(1)}$ in
Eqs. (14) and (16). The total signal to noise in these
components is then given by

\begin{equation}
\left(\frac{S}{N}\right)_B^2 \equiv \sum_{\rm triangles} \frac{B^2}{\Delta B^2}, \qquad
\left(\frac{S}{N}\right)_T^2 \equiv \sum_{\rm quads} \frac{[T^{(1)}]^2}{\Delta T^2}, \tag{29}
\end{equation}

giving a non-marginalized uncertainty on $b_1$

\begin{equation}
\Delta b_1^B \simeq \frac{1}{(S/N)_B}, \qquad \Delta b_1^T \simeq \frac{1}{2\,(S/N)_T}; \tag{30}
\end{equation}

the factor of 2 enhancement for the trispectrum case
is due to the fact that $Q_T$ is sensitive to $b_1^2$, see
Eq. (20), where we used that $B^2/\Delta B^2 \approx Q_B^2/\Delta Q_B^2$
and the same for $Q_T$. In order to make an estimate,
we can replace the triple and quadruple sums with
a single sum over integers, introducing a coefficient
that takes into account the number of configurations
having the maximum of the three or four sides
equal to a certain $k$, and replacing $B(k_1,k_2,k_3)$ and
$T^{(1)}(k_1,k_2,k_3,k_4)$ with $B(k)$ and $T^{(1)}(k)$ as given
by representative configurations of side $k$. Assuming
a bin in Fourier space $\delta k$, we have the cumulative
signal to noise,

\begin{equation}
\left(\frac{S}{N}\right)_B^2 \simeq \sum_{i=1}^{i_{\max}} N_B(i)\, \frac{[B(k)]^2}{\Delta B^2(k)}, \tag{31}
\end{equation}

where $k = i\,\delta k$, $i_{\max} = k_{\max}/\delta k$, $N_B(i) = i(i+1)/2$,
and

\begin{equation}
\left(\frac{S}{N}\right)_T^2 \simeq \sum_{i=1}^{i_{\max}} N_T(i)\, \frac{[T^{(1)}(k)]^2}{\Delta T^2(k)}, \tag{32}
\end{equation}

where $N_T(i) = i(i+1)(i+2)/6$. Now, we use

\begin{equation}
B(k) \simeq 3\, Q_B\, P^2(k), \qquad \Delta B^2(k) = \frac{k_f^3\, P^3(k)}{8\pi^2\, k^3\, (\delta k)^3}, \tag{33}
\end{equation}

with $Q_B$ some representative amplitude of order one,
and for the averaged trispectrum

\begin{equation}
T^{(1)}(k) \simeq C_T\, P^3(k), \qquad \Delta T^2(k) = \frac{k_f^3\, P^4(k)}{32\pi^3\, k^5\, (\delta k)^4}, \tag{34}
\end{equation}

where the constant $C_T$ is difficult to determine in
a simple way: being the result of complicated angular
integrations, it depends on the configuration of
wavevectors and on the binning size $\delta k$.

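It is instructive to compare these estimates to the
power spectrum case,

\begin{equation}
\left(\frac{S}{N}\right)_P^2 \simeq \sum_{i=1}^{i_{\max}} \frac{[P(k)]^2}{\Delta P^2(k)}
 = \sum_{i=1}^{i_{\max}} 2\pi\, i^2 \left(\frac{\delta k}{k_f}\right)^3
 \simeq \frac{2\pi}{3}\, k_{\max}^3\, \frac{V}{(2\pi)^3}, \tag{35}
\end{equation}

where we used $\Delta P^2(k) = 2P^2(k)/N_i$ with $N_i =
4\pi i^2 (\delta k/k_f)^3$ the number of Fourier modes in the bin
given by $i$, and $V$ is the volume of the survey. For
the bispectrum one gets instead,

(36)

where $\Delta(k) \equiv 4\pi k^3 P(k)$; assuming $Q_B \simeq 1$ and an
effective spectral index $n_{\rm eff} \simeq -1.5$ this gives,

(37)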

Comparing Eqs. (35) and (37) we see that up to
scales where $\Delta(k_{\max}) < 1$ there should be more signal
to noise in the bispectrum than in the power spectrum.
This simple estimate ignores shot noise and survey
geometry, but we shall see that this conclusion
remains true when these are included.
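The single-sum estimates are easy to evaluate numerically. The sketch below accumulates the power spectrum signal to noise from $\Delta P^2 = 2P^2/N_i$ and the bispectrum signal to noise from Eqs. (31) and (33), for any user-supplied $P(k)$. It is an illustration under the same simplifications as the text (no shot noise, diagonal covariance); the function name and the test spectrum are assumptions made here.

```python
import math

def cumulative_sn2(k_max, dk, kf, power, q_b=1.0):
    """Return ((S/N)^2_P, (S/N)^2_B) summed over bins k = i * dk up to k_max.

    `power` is a callable P(k); q_b is the representative reduced bispectrum amplitude.
    """
    i_max = int(round(k_max / dk))
    sn2_power = 0.0
    sn2_bisp = 0.0
    for i in range(1, i_max + 1):
        k = i * dk
        p = power(k)
        # Power spectrum: P^2 / dP^2 = N_i / 2 with N_i = 4 pi i^2 (dk / kf)^3 modes.
        n_modes = 4.0 * math.pi * i ** 2 * (dk / kf) ** 3
        sn2_power += n_modes / 2.0
        # Bispectrum: N_B(i) triangles with largest side k, Eqs. (31) and (33).
        n_triangles = i * (i + 1) / 2.0
        bisp = 3.0 * q_b * p ** 2
        var_bisp = kf ** 3 * p ** 3 / (8.0 * math.pi ** 2 * k ** 3 * dk ** 3)
        sn2_bisp += n_triangles * bisp ** 2 / var_bisp
    return sn2_power, sn2_bisp

if __name__ == "__main__":
    kf = 0.005                                          # illustrative fundamental mode, h/Mpc
    toy_power = lambda k: 1.0 / (4.0 * math.pi * k ** 3)  # chosen so that Delta(k) = 1
    print(cumulative_sn2(0.3, 3.0 * kf, kf, toy_power))
```

In this sketch each bispectrum term equals the power spectrum term times $(9/2)\,Q_B^2\,\Delta(k)\,(i+1)/i$, so the relative weight of the bispectrum grows with $\Delta(k)$, i.e. toward smaller scales.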

To estimate the trispectrum signal to noise we
would need to evaluate the constant $C_T$ in Eq. (34);
we will do this implicitly in Fig. 3 below, when we
properly take into account the sum over all configurations
of the estimator in Eq. (26). For our purpose
here it is enough to note that we can approximately
reproduce the scaling in Fig. 3 by using
$C_T \simeq \bar{C}_T\,(k_f/\delta k)$ with $\bar{C}_T$ approximately constant.
Then we have,

(38)

We see that compared to the bispectrum, Eq. (37),
the trispectrum is suppressed at the largest scales
by a factor of $\Delta(k_{\max})$, similar to what happens in
going from the power spectrum to the bispectrum,
Eqs. (35) and (37).

In addition there is a bin-size dependent factor
due to the particular average we are doing in our
trispectrum estimator, which results from the effect
of the angular integration over the PT kernels. This
bin-size dependent factor is only illustrative, as we
mentioned above, since it is hard to estimate precisely,
but it indicates that averaging over the configuration
dependence due to the PT kernels decreases
the signal to noise. We use below $\delta k = 3k_f$,
and therefore we could potentially gain a factor of
three in signal to noise by doing "minimal" averaging,
$\delta k = k_f$.

In order to improve over Eq. (38), we show in the
top three lines in Fig. 3 the result of computing
the signal to noise for the power spectrum (solid),
bispectrum (dashed) and trispectrum (dotted) by
explicitly doing the respective sums over configurations
up to some maximum scale $k_{\max}$, assuming
an ideal (diagonal covariance matrix) survey with
volume $V = 0.3\,(h^{-1}\,{\rm Gpc})^3$ and a galaxy density
$\bar{n} = 0.003\,(h^{-1}\,{\rm Mpc})^{-3}$.

We see from the top three lines in Fig. 3 that
the signal to noise increases faster as a function
of $k_{\max}$ for higher-order statistics, with the signal
weighted more toward smaller scales. The effect of
shot noise is simple to estimate as well: for the
$N$-point spectrum each scale is penalized by a factor
of $[\bar{n}P_i/(1 + \bar{n}P_i)]^N$, so higher-order statistics are
penalized more by poor sampling. However, in the
case of Fig. 3 shot noise is small enough to be almost
unimportant. Given the estimates in Fig. 3, the expected
uncertainties in the linear bias from Eq. (30)
are $\Delta b_1^B \approx 3 \times 10^{-3}$ and $\Delta b_1^T \approx 10^{-3}$ for this ideal
geometry.
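The shot-noise penalty quoted above is a one-line function of the dimensionless product $\bar{n}P$. The sketch below simply tabulates it for the power spectrum, bispectrum and trispectrum; the function name and the sample values of $\bar{n}P$ are illustrative assumptions, with $\bar{n}P$ understood in the convention where the shot-noise power is $1/\bar{n}$.

```python
def shot_noise_penalty(nbar_p, order):
    """Factor [nP / (1 + nP)]^N multiplying the signal-to-noise squared of an N-point spectrum."""
    if nbar_p <= 0.0:
        raise ValueError("nbar_p must be positive")
    return (nbar_p / (1.0 + nbar_p)) ** order

if __name__ == "__main__":
    for nbar_p in (0.3, 3.0, 30.0):  # sparse, intermediate and dense sampling
        penalties = [shot_noise_penalty(nbar_p, n) for n in (2, 3, 4)]
        print(nbar_p, penalties)
```

For densely sampled fields ($\bar{n}P \gg 1$) all three factors approach one, which is the regime in which shot noise is almost unimportant.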

We should note that the signal to noise figure of
merit shown in Fig. 3 does not capture the full extent
of the statistical power in the bispectrum and
trispectrum, since there are many additional components
in the presence of nonlinear bias; in fact this
is the crucial point that leads to constraints on the
mean of the HOD, see section IV. The signal to
noise contained in these additional terms depends
on the type of galaxy, but note that nonlinear
large-scale bias is inevitable in the framework of the halo
model [25]: even if $b_1 \simeq 1$ it is very difficult to have
$b_2 = b_3 = 0$, see Eq. (43) and the figure below.

B. Likelihood Analysis: Ideal Geometry 

Since most of the signal is coming from scales
small compared to the size of the survey, it is
reasonable to assume that the joint likelihood for the
power spectrum, bispectrum and trispectrum will be
Gaussian; this can be checked for a particular survey
geometry by simulating a large pool of mock
catalogs [refs.]. We work with the reduced
amplitudes $Q_B$ and $Q_T$, which to very good approximation
are independent of cosmological parameters
(e.g. $\Omega_m$) and the amplitude of the power
spectrum ($\sigma_8$). The only remaining dependence on
cosmology is through the shape of the power spectrum,
which we take to be fixed by power spectrum
measurements here. This is only important for a
relatively small number of configurations; when all
scales are of the same order the reduced amplitudes
$Q_B$ and $Q_T$ become independent of the shape of
the power spectrum. We will consider the more general
case, where one simultaneously constrains the
power spectrum, bispectrum and trispectrum, in future
work. The joint Gaussian likelihood for $Q_B$ and
$Q_T$ reads

\begin{equation}
-2 \ln \mathcal{L} = {\rm const} + \sum_{\rm triangles} \frac{\left( Q_B^{\rm obs} - Q_B^{\rm mod} \right)^2}{\left( \Delta Q_B^{\rm mod} \right)^2}
 + \sum_{\rm quads} \frac{\left( Q_T^{\rm obs} - Q_T^{\rm mod} \right)^2}{\left( \Delta Q_T^{\rm mod} \right)^2}, \tag{39}
\end{equation}



where $Q_B^{\rm mod}$ and $Q_T^{\rm mod}$ are given in terms of bias
parameters by Eqs. (15) and (20), respectively, and
by sum over "quads" we mean those averaged trispectrum
configurations that appear in our estimator,
Eq. (26). In addition, we consider only those quads
with vanishing non-connected component. In particular
we limit ourselves to the cases with $k_1 > k_2 >
k_3 > k_4$ (with $\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3 + \mathbf{k}_4 = 0$). In principle we
could include configurations with two or three equal
wavevector lengths, but in the case of a nontrivial
survey geometry such configurations have a leakage
from non-connected contributions due to the coupling
of the Fourier modes by the survey window.
In order to be conservative we restrict here as well
to these "safe" configurations.

Given the likelihood function in Eq. (39), we compute
the expected errors on the bias parameters
using the various components of bispectrum and
trispectrum as determined by Eulerian perturbation
theory as described in the previous section, for
surveys with ideal geometry and different volumes.
We consider a ΛCDM cosmology with $\Omega_m = 0.27$,
$\sigma_8 = 0.82$, corresponding to a nonlinear scale of
$k_{NL} \simeq 0.3\,h\,{\rm Mpc}^{-1}$.
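To make the forecast step concrete, the following sketch computes a Fisher-matrix estimate of the marginalized errors on $b_1$ and $b_2$ from the bispectrum term of the Gaussian likelihood alone, using $Q_B^{\rm mod} = Q_B/b_1 + b_2/b_1^2$. It is a simplified stand-in for the full analysis (no trispectrum, diagonal covariance, errors held fixed), and the function names and toy inputs are assumptions made here.

```python
import math

def fisher_b1_b2(q_dm, sigma_q, b1=1.0, b2=0.0):
    """Fisher matrix elements (F11, F12, F22) for (b1, b2) from reduced bispectra.

    q_dm: dark matter Q_B per triangle; sigma_q: its 1-sigma error per triangle.
    """
    f11 = f12 = f22 = 0.0
    for q, sigma in zip(q_dm, sigma_q):
        d1 = -q / b1 ** 2 - 2.0 * b2 / b1 ** 3   # dQ_mod / db1
        d2 = 1.0 / b1 ** 2                       # dQ_mod / db2
        weight = 1.0 / sigma ** 2
        f11 += weight * d1 * d1
        f12 += weight * d1 * d2
        f22 += weight * d2 * d2
    return f11, f12, f22

def marginalized_errors(f11, f12, f22):
    """1-sigma errors on b1 and b2, each marginalized over the other."""
    det = f11 * f22 - f12 * f12
    if det <= 0.0:
        raise ValueError("Fisher matrix is singular: Q_B must vary with triangle shape")
    return math.sqrt(f22 / det), math.sqrt(f11 / det)

if __name__ == "__main__":
    q_dm = [0.5, 1.0]       # toy values for two triangle shapes
    sigma_q = [0.1, 0.1]    # toy errors
    print(marginalized_errors(*fisher_b1_b2(q_dm, sigma_q)))
```

If all triangles had the same $Q_B$ the matrix would be singular, which reflects the fact that it is the shape dependence of the bispectrum that separates $b_1$ from $b_2$.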

The results for three different volumes, $V = 0.1$,
0.3 and 1 $(h^{-1}\,{\rm Gpc})^3$, are given in Table I. Including
the trispectrum helps reduce the error on the
determination of $b_1$ and $b_2$ by roughly 20%
when all scales up to $k = 0.3\,h\,{\rm Mpc}^{-1}$ are included.
Note that one can also determine an additional bias
parameter, $b_3$, which would not be possible otherwise.
The error on this cubic bias is not nearly as small
as for $b_1$ and $b_2$; this is almost certainly related to
our averaged trispectrum estimator being far from



FIG. 3: The top three lines show the expected cumulative
(up to scale $k_{\max}$, in $h\,{\rm Mpc}^{-1}$) signal-to-noise for power
spectrum, bispectrum and trispectrum from an ideal survey
with volume $V = 0.3\,(h^{-1}\,{\rm Gpc})^3$ and a galaxy density
$\bar{n} = 0.003\,(h^{-1}\,{\rm Mpc})^{-3}$. The bottom three lines show
the same quantities for the case of the SDSS geometry
(see section III D), including radial selection function,
redshift distortions, and the covariance matrix between
different band powers, triangles and quads. Note
that there are additional contributions due to nonlinear
bias in the bispectrum and trispectrum case not included
here.



optimal for detection of $b_3$; in fact one can easily
check that our averaged trispectrum is averaging out
a significant part of the dependence on configuration
shape present in the full trispectrum; improvement
on this is left for future work.

In Fig. 4 we give the 68% confidence intervals for
two bias parameters at a time, marginalizing over
the third, for the $V = 0.3\,(h^{-1}\,{\rm Gpc})^3$ case.



C. Likelihood Analysis: SDSS forecast 

In this section we consider a realistic survey ge- 
ometry with the induced covariance matrix between 






TABLE I: Marginalized errors (68% CL) on the bias
parameters for three survey volumes determined using
bispectrum and trispectrum alone and combined, with
$k_{\max} = 0.3\,h\,{\rm Mpc}^{-1}$. Volumes are in $(h^{-1}\,{\rm Gpc})^3$ and
densities in $(h^{-1}\,{\rm Mpc})^{-3}$.

V      n            Param.   Bisp.    Trisp.   Combined

1      10^-4        Δb_1     0.033    0.030    0.030
                    Δb_2     0.042    0.040    0.040
                    Δb_3              0.18     0.18

0.3    3 x 10^-3    Δb_1     0.0065   0.0082   0.0050
                    Δb_2     0.0080   0.012    0.0066
                    Δb_3              0.064    0.032

0.1    10^-3        Δb_1     0.014    0.025    0.012
                    Δb_2     0.018    0.039    0.016
                    Δb_3              0.21     0.078



different configurations both for the bispectrum and
the trispectrum. We also include redshift distortions,
as calculated by second-order Lagrangian Perturbation
Theory (2LPT); see [ref.] for a comparison
of 2LPT against N-body simulations for the
redshift-space bispectrum. We will present a similar
comparison for the trispectrum in [56]. For biasing, we
assume Eqs. (15) and (20) still hold in redshift space,
which is a reasonable approximation near our
fiducial unbiased model.

We consider a survey geometry that approximates
the north part of the SDSS survey, a 10,400 square
degree region [70]. We do not include the south part
of the survey in our analysis, which has a smaller
volume and a nearly two-dimensional geometry that
complicates the simplified bispectrum and trispectrum
analysis we do below. For the radial selection
function we use that from [ref.], and we assume
that the angular selection function is unity everywhere
inside the survey region, which is a very good
approximation.

The mock catalogs are the same we have used before
in [53]. Using a 2LPT code [ref.] with about
$42 \times 10^6$ particles in a rectangular box of sides
$L_i = 660$, 990 and 1320 $h^{-1}\,{\rm Mpc}$, we have created
6080 realizations of the survey geometry. In
all cases, cosmological parameters are as in the
ideal geometry analysis and $b_1 = 1$, $b_2 = b_3 = 0$.
For each of these realizations, we have measured
the redshift-space bispectrum and trispectrum
for configurations of all shapes with sides between
$k_{\min} = 0.02\,h\,{\rm Mpc}^{-1}$ and $k_{\max} = 0.3\,h\,{\rm Mpc}^{-1}$, giving
a total of $N_{\rm triangles} = 7.5 \times 10^{10}$ triangles and
$N_{\rm quads} = 4.0 \times 10^{15}$ quads. These are binned into
$N_T = 1015$ triangles and $N_Q = 1720$ quads with a
bin size of $\delta k = 0.015\,h\,{\rm Mpc}^{-1}$. Each mock catalog
takes about 15 minutes to generate and contains
about $5.7 \times 10^5$ galaxies. The redshift-space density
field in each mock catalog is then weighted using the
FKP procedure [54]; see e.g. [50, 55] for a discussion
in the bispectrum case. The results we present
correspond to a weight $P_0 = 5000\,(h^{-1}\,{\rm Mpc})^3$. The
bispectrum and the averaged trispectrum are then
measured in each realization. The bispectrum takes
about 2 minutes per realization, while the averaged
trispectrum takes about 40 minutes [71]. However,
the correction due to shot noise and geometry in the
trispectrum case is nontrivial [56], since it does not
involve the power spectrum and bispectrum at the
already measured configurations, but at more complicated
configurations, e.g. involving $k_{12}$ instead
of $k_1$, $k_2$. Computing these additional correlators
is time consuming, though still doable; doing so by
"brute force" adds an additional 13 hours per realization.

FIG. 4: Joint 68% confidence intervals marginalized over
a single parameter for $V = 0.3\,(h^{-1}\,{\rm Gpc})^3$, galaxy density
$\bar{n} = 3 \times 10^{-3}\,(h^{-1}\,{\rm Mpc})^{-3}$ and $k_{\max} = 0.3\,h\,{\rm Mpc}^{-1}$.

In order to generalize the discussion given in the
previous section to the case of arbitrary survey
geometry, we use the bispectrum and trispectrum
eigenmodes $q_n$; see [50] for a detailed discussion for
the bispectrum and [51] for the three-point function
case. The discussion is the same for both; here we
only summarize it for the bispectrum. The eigenmodes
can be written as a linear combination (chosen
here to have zero mean),

\begin{equation}
q_n = \sum_{m=1}^{N_T} \gamma_{mn}\, \frac{Q_m - \bar{Q}_m}{\Delta Q_m}, \tag{40}
\end{equation}

where $\bar{Q}_m \equiv \langle Q_m \rangle$ and $(\Delta Q_m)^2 \equiv \langle (Q_m - \bar{Q}_m)^2 \rangle$. By
definition they diagonalize the bispectrum covariance
matrix,

\begin{equation}
\langle q_n q_m \rangle = \lambda_n^2\, \delta_{nm}, \tag{41}
\end{equation}
and have signal to noise, 




The eigenmodes are easy to interpret when ordered
in terms of their signal to noise [50]. The best
eigenmode (highest signal to noise), say $n = 1$,
corresponds to all weights $\gamma_{m1} > 0$; that is, it represents
the overall amplitude of the bispectrum averaged
over all triangles. The next eigenmode, $n = 2$, has
$\gamma_{m2} > 0$ for nearly collinear triangles and $\gamma_{m2} < 0$
for nearly equilateral triangles; thus it represents the
dependence of the bispectrum on triangle shape.
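The eigenmode construction can be sketched with a few lines of NumPy applied to a set of mock measurements. This is an illustrative outline, not the pipeline used for the forecasts: the function name is hypothetical, and the per-mode signal to noise is taken here to be $|\sum_m \gamma_{mn} \bar{Q}_m/\Delta Q_m|/\lambda_n$, which is an assumption since the corresponding expression is not legible in the text.

```python
import numpy as np

def eigenmodes(q_mocks):
    """Eigenmode decomposition of reduced amplitudes measured in mock catalogs.

    q_mocks: array of shape (n_realizations, n_configurations).
    Returns mean, scatter, weights gamma[m, n], lambda_n and S/N per mode,
    with modes ordered by decreasing signal to noise.
    """
    q_mean = q_mocks.mean(axis=0)
    dq = q_mocks.std(axis=0)
    z = (q_mocks - q_mean) / dq                 # (Q_m - mean) / Delta Q_m
    corr = z.T @ z / len(q_mocks)               # correlation matrix between configurations
    lam2, gamma = np.linalg.eigh(corr)          # columns of gamma are the weights gamma_mn
    lam = np.sqrt(lam2)
    sn = np.abs(gamma.T @ (q_mean / dq)) / lam  # assumed signal-to-noise definition
    order = np.argsort(sn)[::-1]
    return q_mean, dq, gamma[:, order], lam[order], sn[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixing = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
    mocks = 1.0 + 0.1 * rng.standard_normal((500, 3)) @ mixing  # toy correlated mocks
    q_mean, dq, gamma, lam, sn = eigenmodes(mocks)
    q = ((mocks - q_mean) / dq) @ gamma         # eigenmodes q_n for each realization
    print(np.round(q.T @ q / len(mocks), 6))    # diagonal, with entries lambda_n^2
```

Projecting each realization onto the weights yields uncorrelated modes with variances $\lambda_n^2$, which is what allows the likelihood to be written as a product over modes.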

The same arguments hold for trispectrum eigenmodes.
Here again the first eigenmode corresponds
to the overall amplitude of $Q_{\tilde T}$ averaged over all
configurations, while higher-order eigenmodes contain
further information. Although the average over
the angles defining $\tilde{T}$ in Eq. (26) washes away a large
part of the information contained in the trispectrum,
we can still expect a different behavior from configurations
with almost equal values for the $k_i$'s and
configurations with, for instance, $k_1 \gg k_2, k_3, k_4$, where
the average over the angles plays little role. A more
detailed analysis of the dependence of the trispectrum
(full and angle-averaged) is given in [56].

If the bispectrum and trispectrum likelihood func- 
tions are Gaussian, we can write down the likelihood 
for the bias parameters bj as, 

N T Nq 

« n ^[<f ({M)/Af]*n midbj})/^}, 
i=i i=i 

(43) 

where the $P_i(x)$ are all equal and Gaussian with unit
variance. We calculate the bispectrum $N_T \times N_T$
covariance matrix and the trispectrum $N_Q \times N_Q$
covariance matrix from our realizations of the survey
and from that obtain the respective $\gamma_{mn}$ and $\lambda_n$,
which give the ingredients to implement Eq. (43).
The results from such likelihood analysis are shown
in Table II for the marginalized errors on each bias
parameter and Fig. 5 for the bivariate contours with
the third parameter marginalized over. The results
are given for separate and joint bispectrum (BS) and
trispectrum (TS). It is interesting to note that, compared
to the ideal geometry case in the previous section,
the trispectrum helps to reduce the errors by
almost 40% here, twice as much as in the ideal geometry
case. Again we note that the poor determination
of $b_3$ compared to $b_1$, $b_2$ is likely due to
the fact that the averaged trispectrum we use is not
nearly as optimal as it could be if we used its full
configuration dependence. In addition,
the trispectrum analysis by itself is expected to give
similar accuracy regarding linear and quadratic bias
parameters as the bispectrum. This can be used for
consistency checks of the results and to test sensitivity
to scale dependence of the bias parameters, given
that the bispectrum and trispectrum are sensitive
to somewhat different scales.
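
A minimal sketch of how one factor of Eq. (43) could be evaluated for a single statistic. It assumes that $q_i(\{b_j\})$ is built as in Eq. (40) with the mean replaced by the model prediction for the trial bias parameters; this is our reading and is not spelled out in the text. The function names and the two-configuration numbers are hypothetical. The second function evaluates the same quantity from the explicit inverse covariance, as a cross-check.

```python
import math


def log_likelihood(data, model, delta_q, gamma, lam):
    """ln L (up to a constant) for Gaussian eigenmodes, as in Eq. (43).

    data, model : measured and predicted Q_m for each configuration m
    delta_q     : rms scatter Delta Q_m from the mock realizations
    gamma       : gamma[m][n], weight of configuration m in eigenmode n
    lam         : lam[n], rms of eigenmode n (square root of its eigenvalue)
    """
    n_conf = len(data)
    x = [(data[m] - model[m]) / delta_q[m] for m in range(n_conf)]
    chi2 = 0.0
    for n in range(len(lam)):
        q_n = sum(gamma[m][n] * x[m] for m in range(n_conf))
        chi2 += (q_n / lam[n]) ** 2
    return -0.5 * chi2


def log_likelihood_full(data, model, delta_q, r):
    """Same quantity from the explicit inverse of a 2x2 correlation matrix."""
    x0 = (data[0] - model[0]) / delta_q[0]
    x1 = (data[1] - model[1]) / delta_q[1]
    # Inverse of [[1, r], [r, 1]] applied to (x0, x1).
    chi2 = (x0 * x0 - 2.0 * r * x0 * x1 + x1 * x1) / (1.0 - r * r)
    return -0.5 * chi2


# Invented two-configuration example with correlation coefficient r.
r = 0.7
s = 1.0 / math.sqrt(2.0)
gamma = [[s, s], [s, -s]]                        # eigenvectors as columns
lam = [math.sqrt(1.0 + r), math.sqrt(1.0 - r)]   # rms of each eigenmode
data, model, delta_q = [1.25, 0.58], [1.20, 0.60], [0.11, 0.10]

lnl_modes = log_likelihood(data, model, delta_q, gamma, lam)
lnl_full = log_likelihood_full(data, model, delta_q, r)
```

Because Eq. (43) is a product over eigenmodes of both statistics, the joint log-likelihood is the sum of one such term for the bispectrum and one for the trispectrum.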



D. Comparison of Signal to Noise Against the 
Power Spectrum: Effects of Covariance 

We now go back to the question raised in Section
III A and Fig. 3 regarding the comparison between
the signal to noise in the power spectrum, bispectrum
and trispectrum. We have measured the
power spectrum from the same mock catalogs and
calculated the signal to noise as a function of $k_{\max}$
TABLE II: Marginalized errors (68% CL) on the bias
parameters for the SDSS geometry, determined using the
bispectrum and trispectrum alone and combined, with
$k_{\max} = 0.3\,h\,\mathrm{Mpc}^{-1}$.

Parameter      Bispectrum   Trispectrum   Combined
$\Delta b_1$   0.036        0.034         0.024
$\Delta b_2$   0.046        0.047         0.032
$\Delta b_3$   -            0.24          0.20


[Figure 5: panel title "SDSS Geometry".]


FIG. 5: Joint 68% confidence intervals for two bias pa- 
rameters at a time, with the third parameter marginal- 
ized over. We show results for the bispectrum (BS) 
only, trispectrum (TS) only, and a joint bispectrum plus 
trispectrum analysis. This assumes an SDSS geometry 
including a full covariance matrix for the bispectrum and 
trispectrum. 



from them for the power spectrum, bispectrum and 
trispectrum, including the effects of the covariance 
matrix for the SDSS geometry, always under the 
FKP approximation. 

The lower three lines in Fig. [3] show the results 



of such computation of the cumulative signal to 
noise [T^ • We see that including the covariance ma- 
trix degrades the averaged trispectrum the most, 
and the power spectrum the least, as expected. 
The degradation in the averaged trispectrum case
is rather severe; still, one should keep in mind that
other contributions to the trispectrum and bispectrum
(due to nonlinear bias) have as much signal to noise as
(or more than) the gravity-only contribution
displayed here. In any case, we see that higher-order
statistics have comparable or larger signal to noise
than the power spectrum at scales below the nonlinear
scale, as expected from the simplified analysis in
Section III A.
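
To illustrate why off-diagonal covariance can lower the cumulative signal to noise, here is a toy sketch with only two bins. It uses $(S/N)^2 = s^T C^{-1} s$ in units where each bin has unit variance. The numbers are invented and the function name is hypothetical; the actual computation uses the full covariance matrices measured from the mock catalogs.

```python
import math


def signal_to_noise(s1, s2, r):
    """Total S/N of two bins with individual S/N s1, s2 and correlation r.

    Evaluates sqrt(s^T C^{-1} s) with C = [[1, r], [r, 1]].
    """
    return math.sqrt((s1 * s1 - 2.0 * r * s1 * s2 + s2 * s2) / (1.0 - r * r))


sn_diagonal = signal_to_noise(3.0, 3.0, 0.0)    # independent bins
sn_correlated = signal_to_noise(3.0, 3.0, 0.6)  # positively correlated bins
```

For two bins with equal signal, the total reduces to $s\sqrt{2/(1+r)}$, so any positive correlation $r$ lowers it relative to the diagonal case.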

So far we have expressed the information provided
by higher-order statistics in terms of constraints on
bias parameters. This is the most solid way of
quantifying the information, in the sense of requiring
the fewest assumptions, since it only assumes, basically,
that gravity is the only long-range force in the problem.
Now we discuss how
these constraints can be turned into a probe of the
way dark matter halos are populated with galaxies
by making the additional assumption that we understand
how to calculate the abundance of dark matter
halos and their clustering at large scales.

IV. FROM BIAS TO HOD PARAMETERS 

The halo model provides a very good tool to un- 
derstand galaxy biasing: in the first place the distri- 
bution of dark matter halos is related to the under- 
lying mass distribution (halo biasing) while the Halo 
Occupation Distribution (HOD) plus a radial profile 
prescribes how galaxies populate individual halos. 
While the halo distribution and halo-halo correla- 
tions can be studied and tested reliably in simula- 
tions, our understanding of galaxy clustering is still 
rather poor, since the non-gravitational processes in- 
volved in galaxy formation cannot be modelled ac- 
curately yet. Some of the details of how galaxies 
populate halos are now beginning to be explained in 
terms of gravitational physics (see e.g. [34] and references
therein). Our purpose here is to see how much
one can learn about the mean of the HOD using only 
large-scale information where the physics, standard 
gravitational instability, is better understood. 

We will assume the halo mass function to be the
Sheth-Tormen (ST) mass function based on ellipsoidal
collapse, representing the average
number density $n(m)$ of haloes of a given mass $m$
per unit mass. The galaxy number density is then
related to the halo mass function as

$$ n_g = \int dm \, n(m) \, \langle N_{\rm gal}(m) \rangle , \tag{44} $$

where $\langle N_{\rm gal}(m) \rangle$ is the mean of the HOD and it
represents the average number of galaxies in a halo of
mass $m$. The galaxy bias parameters are then given,
in the large-scale limit, by

$$ b_i = \frac{1}{n_g} \int dm \, n(m) \, b_i(m) \, \langle N_{\rm gal}(m) \rangle , \tag{45} $$

where $b_i(m)$ for $i = 1, 2, 3$ are the halo large-scale
bias parameters. They can be derived in the framework
of non-linear perturbation theory and the spherical
collapse model and its extensions [61, 62, 63, 64, 65]:
they correspond to the small-$\delta$ expansion of the
conditional mass function $n(m|\delta)$. In Fig. 6 we plot
$b_1(m)$, $b_2(m)$ and $b_3(m)$ in the relevant range of
masses. The large-scale limit implies that we can
ignore the size of a single halo and consider it as
point-like, that is, there is no need to know the profile
of galaxies inside halos. Note that in Eq. (45) the
bias parameters are related to halo abundance and
clustering through the mean HOD; no other details
about how galaxies populate halos are needed. The
strategy of our approach is very simple: joint measurement
of the power spectrum, bispectrum and
trispectrum at large scales gives simultaneously the
cosmological information to compute the halo abundance
and bias, and the limits on bias parameters
can be used to constrain the mean HOD through
Eqs. (44) and (45).

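We parametrize the mean HOD as

$$ \langle N_{\rm gal}(m) \rangle = \begin{cases} 0 & \text{for } m < M_{\min} \\ 1 + (m/M_1)^{\beta} & \text{for } m > M_{\min} \end{cases} , \tag{46} $$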

where we have assumed that the average number of
galaxies can be split into two contributions [31]: a
mean occupation number for a central galaxy, corresponding
to $\langle N_{\rm cen} \rangle = 1$ for $m > M_{\min}$ and $\langle N_{\rm cen} \rangle = 0$
for lower masses, and a mean occupation number
for satellite galaxies given by $\langle N_{\rm sat} \rangle = (m/M_1)^{\beta}$.
We fix $M_{\min}$ to be a function of $M_1$ and $\beta$ in order




[Figure 6: curves labeled $b_2(m)/2$ and $b_3(m)/6$; horizontal axis $\log_{10}(m/(M_\odot\,h^{-1}))$.]

FIG. 6: The halo large-scale bias parameters as a func- 
tion of halo mass. 



TABLE III: Marginalized errors (68% CL) on the HOD
parameters $M_1$ and $\beta$ for an ideal geometry survey
with volume $V = 0.3\,(h^{-1}\,\mathrm{Gpc})^3$ and a galaxy number
density $n = 0.003\,(h^{-1}\,\mathrm{Mpc})^{-3}$, and for the SDSS
geometry, determined using the bispectrum alone (in
parentheses) and the bispectrum and trispectrum combined,
with $k_{\max} = 0.3\,h\,\mathrm{Mpc}^{-1}$.

Geometry   $\Delta\beta$      $\Delta\log_{10} M_1$
Ideal      0.0089 (0.0091)    0.016 (0.022)
SDSS       0.059 (0.078)      0.12 (0.15)



to reproduce the galaxy density given by Eq. (44),
and then compute the joint likelihood function for
$M_1$ and $\beta$ using Eq. (45) and the likelihood function
of the galaxy bias parameters studied in the previous
section. The specific parametrization in Eq. (46) is
chosen only for illustration of our method; one could
use other parametrizations.
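
The chain from HOD parameters to galaxy bias in Eqs. (44)-(46) can be sketched numerically. The mass function and halo bias used below are toy placeholders chosen only so that the example runs. They are not the Sheth-Tormen expressions assumed in the text, and all function names and numerical constants other than the fiducial $M_1$ and $\beta$ are hypothetical.

```python
import math


def mean_hod(m, m_min, m1, beta):
    """Mean occupation of Eq. (46): one central plus (m/M_1)^beta satellites."""
    return 0.0 if m < m_min else 1.0 + (m / m1) ** beta


def integrate_log(func, m_lo, m_hi, n_steps=2000):
    """Trapezoidal integral of func(m) dm on a logarithmic mass grid."""
    dlnm = math.log(m_hi / m_lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        m = m_lo * math.exp(i * dlnm)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * func(m) * m * dlnm  # dm = m dln(m)
    return total


def toy_mass_function(m):
    """Placeholder n(m): power law with exponential cutoff (NOT Sheth-Tormen)."""
    m_star = 1.0e13
    return 1.0e-3 / m_star * (m / m_star) ** -1.9 * math.exp(-m / m_star)


def toy_halo_bias(m):
    """Placeholder halo bias rising with mass (NOT a fitted formula)."""
    return 0.7 + 0.3 * (m / 1.0e13) ** 0.5


M_MAX = 1.0e16  # upper integration limit, arbitrary but well past the cutoff


def galaxy_density(m_min, m1, beta):
    """Eq. (44); the integrand vanishes below m_min, so start there."""
    return integrate_log(
        lambda m: toy_mass_function(m) * mean_hod(m, m_min, m1, beta),
        m_min, M_MAX)


def galaxy_bias(m_min, m1, beta, halo_bias=toy_halo_bias):
    """Eq. (45) for one halo bias function b_i(m)."""
    num = integrate_log(
        lambda m: toy_mass_function(m) * halo_bias(m)
        * mean_hod(m, m_min, m1, beta), m_min, M_MAX)
    return num / galaxy_density(m_min, m1, beta)


def solve_m_min(target_density, m1, beta, lo=1.0e10, hi=1.0e15):
    """Bisect in log(m_min): the density decreases as m_min increases."""
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if galaxy_density(mid, m1, beta) > target_density:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


m1, beta = 7.5e12, 1.0
target = galaxy_density(1.0e12, m1, beta)  # density for a known m_min
m_min = solve_m_min(target, m1, beta)      # should recover about 1e12
b_gal = galaxy_bias(m_min, m1, beta)
```

In an actual analysis one would call `solve_m_min` with the observed galaxy density for each trial $(M_1, \beta)$, evaluate `galaxy_bias` once per bias order with the corresponding $b_i(m)$, and pass the resulting $b_1$, $b_2$, $b_3$ to the bias likelihood.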

We study the same survey examples as above,
an ideal (diagonal covariance matrix) survey with
volume $V = 0.3\,(h^{-1}\,\mathrm{Gpc})^3$ and a galaxy number
FIG. 7: 68% joint confidence intervals and marginalized
errors for $M_1$ and $\beta$ for an ideal survey with
$V = 0.3\,(h^{-1}\,\mathrm{Gpc})^3$. The larger contour corresponds to the
case where only the bispectrum is used in the analysis,
the inner contour includes both bispectrum and trispectrum.



density $n = 0.003\,(h^{-1}\,\mathrm{Mpc})^{-3}$, and the SDSS-North
geometry which includes a full covariance matrix.
The joint confidence intervals are presented
in Figs. 7 and 8 respectively. The fiducial values
$M_1 = 7.5 \times 10^{12}\,M_\odot/h$ and $\beta = 1$ correspond to the
values $b_1 = 0.985$, $b_2 = -0.175$ and $b_3 = 0.297$
for the galaxy bias parameters. In Table III we
give the expected marginalized errors on $M_1$ and $\beta$.
Comparing Figs. 7 and 8 one can see that for the
ideal geometry the introduction of the trispectrum,
and thus of cubic bias information, significantly improves
the determination of $M_1$; however, this does
not translate into a similar impact for the SDSS geometry
due to the effects of the covariance matrix
(expected from Fig. 3). In principle one should be
able to recover a similar effect for the SDSS case by
improving on the trispectrum estimator to get better
constraints on $b_3$; Fig. 6 shows that the cubic bias of



FIG. 8: Same as Fig. 7 but with galaxy bias likelihood
obtained from the SDSS geometry including a full bispectrum
and trispectrum covariance matrix analysis.



thus it helps to gain sensitivity on the mass scale
$M_1$.

The example from the SDSS geometry is somewhat
artificial since in a flux limited sample there is
a contribution of a broad class of galaxies with different
clustering properties, thus the effective bias and
HOD parameters that one would obtain are not very
meaningful. Therefore we also give results (with
ideal geometry) for a series of volume limited samples
studied in [30], see Fig. 9. This gives an idea of
how the errors on $\beta$ and $M_1$ depend on different samples
and ultimately on different mass ranges. Here
we assume as maximum likelihood values for $\beta$, $M_1$
and number density those given in Table 3 of [30],
while the volumes are rescaled from 2,500 to 10,400
deg$^2$. Table IV shows the marginalized errors on $M_1$
and $\beta$ for the three subsamples. The best constraints
are expected for the sample with $M_r < -19$ since it
corresponds to the best combination of volume and
galaxy number density.

We can compare these results with those in [30]
obtained from the two-point function analysis down
[Figure 9: legend entries "BS only" and "BS + TS"; horizontal axis $\log_{10}(M_1/(M_\odot\,h^{-1}))$.]

FIG. 9: Same as Fig. 7 but for volumes and densities
corresponding to three of the luminosity threshold samples
studied in [30].



to small scales. Their marginalized errors [66], scaled
to the final survey volume, are better by about a factor
of two for $\Delta\beta$ and three for $\log M_1$ when compared
to those in Table IV. However, their results
assume a fixed cosmological model, and depend
on further assumptions, e.g. about the galaxy profiles
inside halos, modeling of the second moment of
the HOD and halo-halo exclusion. Our results in
Table IV do not include the covariance matrix, thus
some degradation is expected. On the other hand,
our trispectrum estimator can still be improved significantly.
In the end, if further study shows that
the sensitivity of our method does not compare well
with the small-scale analysis approach, the interest
in our method would be to provide an alternative
way of probing the HOD using large-scale information
that can validate the assumptions used in the
small-scale analysis.
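
The gain from adding the trispectrum can be read off Table IV directly. The short script below, a convenience sketch with the table values typed in by hand, computes the fractional reduction in each marginalized error; the variable and function names are ours.

```python
# Table IV entries: (combined BS + TS error, bispectrum-only error),
# keyed by the luminosity threshold of the sample.
table_iv = {
    -20: {"beta": (0.066, 0.077), "log10_M1": (0.08, 0.14)},
    -19: {"beta": (0.042, 0.044), "log10_M1": (0.08, 0.10)},
    -18: {"beta": (0.069, 0.076), "log10_M1": (0.15, 0.20)},
}


def fractional_reduction(combined, bispectrum_only):
    """Fraction by which the error shrinks when the trispectrum is added."""
    return 1.0 - combined / bispectrum_only


reductions = {
    sample: {name: fractional_reduction(*errs) for name, errs in row.items()}
    for sample, row in table_iv.items()
}

for sample, row in reductions.items():
    print(f"M_r < {sample}: beta {100 * row['beta']:.0f}%, "
          f"log10 M1 {100 * row['log10_M1']:.0f}%")
```

With these inputs the reduction is larger for $\log_{10} M_1$ than for $\beta$ in all three samples.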



TABLE IV: Marginalized errors (68% CL) on the HOD
parameters $M_1$ and $\beta$ for ideal geometry surveys with
volumes and densities corresponding to three of the
luminosity threshold samples studied in [30]. Volumes are
in $(h^{-1}\,\mathrm{Gpc})^3$, densities in $(h^{-1}\,\mathrm{Mpc})^{-3}$. In parentheses
we give the results from the bispectrum analysis alone.

$M_r^{\max}$   $V$      $n$     $\Delta\beta$    $\Delta\log_{10} M_1$
-20            0.0065   0.006   0.066 (0.077)    0.08 (0.14)
-19            0.0064   0.015   0.042 (0.044)    0.08 (0.10)
-18            0.0013   0.027   0.069 (0.076)    0.15 (0.20)



V. CONCLUSIONS 

We showed that current surveys have at least as 
much signal to noise in higher-order statistics as in 
the power spectrum at weakly nonlinear scales, and 
studied the constraints on linear, quadratic and cu- 
bic galaxy bias from measurements of the bispec- 
trum and trispectrum at large scales. We introduced 
an averaged trispectrum which is relatively fast to 
compute in current galaxy surveys. We calculated 
the expected marginalized errors on the bias param- 
eters for surveys with ideal geometry of relevant sizes 
and galaxy densities as well as for the more realistic
geometry of the Sloan Digital Sky Survey. We have
shown that the trispectrum analysis alone can give
at least as good results as the bispectrum in the
determination of the linear and quadratic bias, which
can be used for consistency checks, while in addition 
providing constraints on the cubic bias parameter. 
The combined likelihood analysis of the bispectrum 
and trispectrum can improve the results of the bis- 
pectrum alone by about 30%. 
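
The "about 30%" figure can be checked against the SDSS-geometry numbers in Table II. The sketch below also compares the quoted combined error with a naive inverse-variance combination of the two separate errors. That combination assumes the bispectrum and trispectrum constraints are independent, which is our assumption for illustration only and is not made in the analysis.

```python
import math

# Table II (SDSS geometry): errors from bispectrum, trispectrum, combined.
table_ii = {
    "b1": (0.036, 0.034, 0.024),
    "b2": (0.046, 0.047, 0.032),
}

results = {}
for name, (bs, ts, combined) in table_ii.items():
    gain = 1.0 - combined / bs  # fractional reduction relative to BS alone
    # Naive combination of two independent Gaussian constraints (assumption).
    naive = 1.0 / math.sqrt(1.0 / bs**2 + 1.0 / ts**2)
    results[name] = (gain, naive)
    print(f"{name}: gain {100 * gain:.0f}%, naive {naive:.4f}, quoted {combined}")
```

For these inputs the gains are about 33% and 30%, and the naive independent combination lands within about 0.001 of the quoted combined errors.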

We also discussed how one can use the bispectrum
and trispectrum information to determine the mean
of the galaxy halo occupation distribution (HOD),
subject only to adequate modeling of the abundance
and large-scale clustering of halos; the approach is thus
independent of details of how galaxies are distributed
within halos. This provides a novel way of measuring
the way galaxies populate halos and gives a
consistency check on the traditional approach of using
two-point statistics down to small scales, which
necessarily makes more assumptions.

Although our results are promising, a number of
checks and improvements are required to better
understand the statistical power of these techniques. At
the basic level, more work is needed to come up with
a trispectrum estimator that is more sensitive to
bias parameters and fast to evaluate; our attempt
here was purely based on simplicity. Another issue
is the validity of trispectrum results based on perturbation
theory; comparison against N-body simulations
in real and redshift space will be addressed
elsewhere [56]. In addition, our mapping from bias
parameters to constraints on the HOD assumes that
the halo bias parameters are well described by the
Sheth-Tormen conditional mass function, but there
is currently no test of these predictions for $b_2$ and $b_3$
against numerical simulations. We hope to report
on this in the near future.